Document image analysis and recognition: a survey

Authors

Abstract

This paper analyzes the problems of document image recognition and the existing solutions. Document recognition algorithms have been studied for quite a long time; nevertheless, the topic remains relevant and research continues, as evidenced by the large number of associated publications and reviews. However, most of these works and reviews are devoted to individual tasks. In this review, the entire set of methods and approaches necessary for document recognition is considered. A preliminary systematization allowed us to distinguish groups of methods for extracting information from documents of different types: single-page and multi-page, with printed text and handwritten contents, with a fixed template and with a flexible structure, and digitalized in different ways: scanning, photographing, video recording. Here, we consider document analysis and recognition as applied to a wide range of tasks: identification and verification of identity, due diligence, machine learning algorithms, questionnaires, and audits. The methods needed to recognize a single page image are examined: classical computer vision algorithms, i.e., keypoints, local feature descriptors, Fast Hough Transforms, and image binarization, as well as modern neural network models for document boundary detection, document classification, document structure analysis (localization of text blocks and tables), extraction and recognition of details, and post-processing of recognition results. The review provides a description of publicly available experimental data packages for training and testing recognition algorithms. Methods for optimizing the performance of document image analysis and recognition are also described.

Similar resources

A Survey on Recognition and Analysis of Handwritten Document

Handwriting differs from person to person. Some handwriting is legible, while some is difficult to read or understand. Hence, this project aims to recognize handwritten text and understand its content with the help of a neural network and fuzzy logic. It involves segmentation, feature extraction, and classification. The method used here is the Canny Edge Detection Algorithm and the Histogram Of...

Digital Libraries and Document Image Analysis Techniques: a Survey

Nowadays, Digital Libraries have become a widely used service to store and share both born-digital documents and digital versions of works held by traditional libraries. Document images are intrinsically non-structured, and the structure and semantics of the digitized documents are for the most part lost during the conversion. Several techniques related to the Document Image Analysis research area ha...

A Survey on Document Image Analysis and Retrieval System

The digitization of documents and their availability over the network demand solutions for content-based document image analysis, indexing, searching, and retrieval. The signature, logo, and layout of a document present convincing evidence and provide an important form of indexing for effective document image retrieval in a variety of applications. This paper describes methods and techniques de...

Document Analysis and Recognition

The goal of document image understanding is to extract and classify individual data meaningfully from paper-based documents. To date, many methods and approaches have been proposed for the recognition of various kinds of documents, for various technical problems in extending OCR, and for the requirements of practical use. Of course, though the technical research issues in the early st...

Document image processing: graphics recognition

Document analysis or processing is mainly concerned with text and graphics. It involves separation, localisation and recognition. According to Nagy (2000), document analysis is related to document image analysis (DIA), since research overall has been concerned with document image interpretation. In a similar manner, Kasturi et al. (2002) categorise document image analysis into two domai...

Journal

Journal title: Computer Optics

Year: 2022

ISSN: 2412-6179, 0134-2452

DOI: https://doi.org/10.18287/2412-6179-co-1020